The principal goal of computational mechanics is to define pattern and structure so that the
organization of complex systems can be detected and quantified. Computational mechanics developed
from efforts in the 1970s and early 1980s to identify strange attractors as the mechanism driving 
weak fluid turbulence via the method of reconstructing attractor geometry from measurement time 
series and in the mid-1980s to estimate equations of motion directly from complex time series. In 
providing a mathematical and operational definition of structure it addressed weaknesses of these 
early approaches to discovering patterns in natural systems. 

Since then, computational mechanics has led to a range of results from theoretical physics and 
nonlinear mathematics to diverse applications. The former include closed-form analysis of finite- and 
infinite-state Markov and non-Markov stochastic processes that are ergodic or nonergodic and their 
measures of information and intrinsic computation. The applications range from complex materials 
and deterministic chaos and intelligence in Maxwellian demons to quantum compression of classical 
processes and the evolution of computation and language. 

This brief review clarifies several misunderstandings and addresses concerns recently raised re- 
garding early works in the field (1980s). We show that misguided evaluations of the contributions of 
computational mechanics are groundless and stem from a lack of familiarity with its basic goals and 
from a failure to consider its historical context. For all practical purposes, its modern methods and 
results largely supersede the early works. This not only renders recent criticism moot and shows the 
solid ground on which computational mechanics stands but, most importantly, shows the significant 
progress achieved over three decades and points to the many intriguing and outstanding challenges 
in understanding the computational nature of complex dynamic systems. 
I. GOALS


The rise of dynamical systems theory and the matura- 
tion of the statistical physics of critical phenomena in the 
1960s and 1970s led to a new optimism that complicated 
and unpredictable phenomena in the natural world were, 
in fact, governed by simple, but nonlinearly interacting 
systems. Moreover, new mathematical concepts and in- 
creasingly powerful computers provided an entrée to un- 
derstanding how such phenomena emerged over time and 
space. The overarching lesson was that intricate struc- 
tures in a system’s state space amplify microscopic un- 
certainties, guiding and eventually attenuating them to 
form complex spatiotemporal patterns. In short order, 
though, this new perspective on complex systems raised 
the question of how to quantify their unpredictability and 
organization. 


By themselves, qualitative dynamics and statistical me- 
chanics were mute to this challenge. The first hints at 
addressing it lay in Kolmogorov’s (and contemporaries’) 
introduction of computation theory [1-3] and Shannon’s
information theory [4] into continuum-state dynamical 
systems [5-11]. This demonstrated that information had 
an essential role to play in physical theories of complex 
phenomena—a role as important as energy, but comple- 
mentary. Specifically, it led to a new algorithmic foun- 
dation to randomness generated by physical systems— 
behavior that cannot be compressed is random—and so 
a bona fide measure of unpredictability of complex sys- 
tems was established. 


Generating information, though, is only one aspect of 
complex systems. How do they store and process that in- 
formation? Practically, the introduction of information 
and algorithmic concepts side-stepped questions about 
how the internal mechanisms of complex systems are 
structured and organized. Delineating their informa- 
tional architecture was not addressed, for good reason. 
The task is subtle. 


Even if we know their governing mechanisms, complex 
systems (worth the label) generate patterns over long 
temporal and spatial scales. For example, the Navier- 
Stokes partial differential equations describe the local 
in time and space balance of forces in fluid flows. A
static pressure difference leads to material flow. However,
despite the fact that any flow field is governed instantaneously
by these equations of motion, the equations
themselves do not directly describe fluid structures
such as vortices, vortex pairs, vortex streets, and vortex 
shedding, let alone turbulence [12]. When structures are 
generated at spatiotemporal scales far beyond those di- 
rectly specified by the equations of motion, we say that 
the patterns are emergent. 


Two questions immediately come to the fore about emer- 
gent patterns. And, this is where the subtlety arises [13]. 
We see that something new has emerged, but how do we 
objectively describe its structure and organization? And, 
more prosaically, how do we discover patterns in the first 
place? 


Refining the reconstruction methods developed to iden- 
tify chaotic dynamics in fluid turbulence [14, 15], compu- 
tational mechanics [16, 17] provided an answer that was 
as simple as it was complete: a complex system’s archi- 
tecture lies in its causal states. A causal state is a set of 
histories, all of which lead to the same set of futures. It’s 
a simple dictum: Do not distinguish histories that lead 
to the same predictions of the future. 


The causal states and the transition dynamic over them 
give a canonical representation: the ε-machine. A system’s
ε-machine is its unique optimal predictor of minimal
size [16, 18, 19]. The historical information stored
in the causal states of a process quantifies how structured
the process is. A process’ ε-machine is its effective
theory, its equations of motion. One notable aspect of
the ε-machine construction is that focusing on how to
optimally predict a process leads to a notion of struc- 
ture. Predictability and organization are inextricably in- 
tertwined. 


With a system’s ε-machine minimal representation in
hand, the challenge of quantifying emergent organization 
is solved. The answer lies in a complex system’s intrinsic 
computation [16] which answers three simple questions: 


1. How much of the past does a process store?
2. In what architecture is that information stored?
3. How is that stored information used to produce future
behavior?


The answers are direct: the stored information is that 
in the causal states; the process’ architecture is laid out 
explicitly by the ε-machine’s states and transitions; and
the production of information is the process’ Shannon 
entropy rate. 
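
As a minimal sketch of these three answers, assume a small unifilar machine written as a Python dictionary. The machine below (a two-state process that never emits two 1s in a row), the state labels, and all function names are illustrative assumptions, not taken from the cited works. The stored information is computed as the Shannon entropy of the stationary state distribution and the production of information as the state-averaged entropy of the outgoing transitions.

```python
from math import log2

# Hypothetical example machine: machine[state][symbol] = (probability, next_state).
# Unifilar: each (state, symbol) pair leads to exactly one next state.
GOLDEN_MEAN = {
    "A": {"0": (0.5, "A"), "1": (0.5, "B")},
    "B": {"0": (1.0, "A")},
}

def stationary_distribution(machine, iterations=1000):
    """Approximate the stationary state distribution by power iteration."""
    states = sorted(machine)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for s in states:
            for prob, target in machine[s].values():
                nxt[target] += pi[s] * prob
        pi = nxt
    return pi

def statistical_complexity(machine):
    """Shannon entropy (bits) of the stationary distribution over states."""
    pi = stationary_distribution(machine)
    return -sum(p * log2(p) for p in pi.values() if p > 0)

def entropy_rate(machine):
    """State-averaged entropy (bits per symbol) of the outgoing transitions."""
    pi = stationary_distribution(machine)
    return -sum(
        pi[s] * prob * log2(prob)
        for s in machine
        for prob, _ in machine[s].values()
        if prob > 0
    )

print(statistical_complexity(GOLDEN_MEAN), entropy_rate(GOLDEN_MEAN))
```

For this example the stationary distribution is (2/3, 1/3), so the stored information is about 0.918 bits and the entropy rate is 2/3 bit per symbol.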


At first blush it may not be apparent, but in this, computational
mechanics parallels basic physics. Physics tracks
various kinds of energy and monitors how they can be
transformed into one another. Computational mechan- 
ics asks, What kinds of information are in a system and 
how are they transformed into one another? Although 
the ε-machine describes a mechanism that generates a
system’s statistical properties, computational mechanics 
captures more than mere generation. And, this is how 
it was named: I wished to emphasize that it was an extension
of statistical mechanics that went beyond analyzing
a system’s statistical properties to capturing its
computation-theoretic properties—how a system stores 
and processes information, how it intrinsically computes. 


II. PROGRESS 


One might be concerned that this view of complex sys- 
tems is either not well grounded, on the one hand, or not 
practical, on the other. Over the last three decades, however,
computational mechanics has led to a number of novel
results, from theoretical physics and nonlinear mathematics
that solidified its foundations to applications that attest
to its utility as a way to discover new science. Discoveries
from over the last decade or so give a sense of
the power of the ideas and methods, both their breadth 
and technical depth. 


Recent theoretical physics and nonlinear mathematics 
contributions include the following: 


• Continuum and nonergodic processes [20-27];
• Analytical complexity [28-33];
• Causal rate distortion theory [34-36];
• Synchronization and control [37-40];
• Enumerating memoryful processes [16, 41, 42];
• Crypticity and causal irreversibility [43-49];
• Bayesian structural inference [50-52];
• Input-output systems [53];
• Complexity of prediction versus generation [54];
• Predictive features and their dimensions [55];
• Sufficient statistics from effective channel states [56];
• Equivalence of history and generator ε-machines [57];
• Informational anatomy [58]; and
• Automated pattern detection [59].


Recent applications of computational mechanics include 
the following: 


• Complex materials [32, 33, 60-62];
• Stochastic thermodynamics [63-71];
• Information fluctuations [72, 73];
• Information creation, destruction, and storage [74];
• Spatiotemporal computational mechanics [75, 76];
• Quantum mechanics [77-81]; and
• Evolution [82-86].


Staying true to our present needs, this must leave out 
detailed mention of a substantial body of computational 
mechanics research by others—a body that ranges from 
quantum theory and experiment [87-89] and stochastic 
dynamics [90-95] to spatial [96-101] and social systems 
[102]. 


III. HISTORY


What’s lost in listing results is the intellectual history 
of computational mechanics. Where did the ideas come 
from? What is their historical context? What problems 
drove their invention? Revisiting the conditions from 
which computational mechanics emerged shows that as- 
pects of this history resonate with the science that fol- 
lowed. 


My interests started as a fascination with mainframe com- 
puters in the 1960s and with information theory in the 
1970s. I worked for a number of years in Silicon Valley, for 
IBM at what was to become its Almaden Research Cen- 
ter on information storage technology—magnetic bubble 
devices—and at Xerox’s Palo Alto Research Center— 
which at the time was busily inventing our current com- 
puting environment of packet-based networks (ethernet), 
internet protocols, graphical user interfaces, file servers, 
bitmap displays, mice, and personal workstations. An 
active member of the Homebrew Computer Club, I built 
a series of microcomputers—4-bit, 8-bit, and eventually 
16-bit machines. There, I met many technology buffs, 
several of whom later became titans of modern information
technology. I suggested and then helped code up the first 
cellular automaton simulator on a prototype 6502 (8-bit) 
microcomputer, which would become the Apple I. 


As a college student at the University of California, Santa 
Cruz (UCSC), I learned about the mathematics of com- 
puters and communication theory directly from the in- 
formation theory pioneer David Huffman. Huffman, in 
particular, was well known for his 1950s work on minimal 
machines—on what was called machine synthesis. His pi- 
oneering work was an integral part of his discrete mathe- 
matics and information theory courses. Harry Huskey, 
one of the engineers on the first US digital comput- 
ers (ENIAC and EDVAC) also taught at UCSC and I 
learned computer architecture from him. In short, think- 
ing about computing and its physical substrates went 
hand in hand with my physics training in statistical me- 
chanics and mathematics training in dynamical systems 
theory. This theme drove the bulk of my research on 
chaotic dynamics. 


With this background in mind, let me turn to address 
what were the immediate concerns of nonlinear physics 
in the 1980s. As computers reduced in size and cost, they 
became an increasingly accessible research tool. In the 
late 1970s and early 1980s it was this revolution that led 
to the burgeoning field of nonlinear dynamics. In con- 
trast with abstract existence proofs, through computer 
simulations we could simply look at and interact with 
the solutions of complex nonlinear systems. In this way, 
the new tools revealed, in what had been relatively abstract
mathematics through most of the 20th century, a new universe
of exquisitely complex, highly ramified structures
and unpredictable behaviors. 


Randomness emerged spontaneously, though paradoxi- 
cally we knew (and had programmed) the underlying 
equations of motion. This presented deep challenges. 
What is randomness? Can we quantify it? Can we ex- 
tract the underlying equations of motion from observa- 
tions? Soberingly, was each and every nonlinear system, 
in the vast space of all systems, going to require its own 
“theory”? The challenge, in essence, was to describe the 
qualitative properties of complex systems without getting 
bogged down in irrelevant explicit detail and microscopic 
analysis. How to see the structural forest for the chaotic 
trees? 


In the 1970s a target problem to probe these ques- 
tions was identified by the nonlinear physics community— 
fluid turbulence—and a testable hypothesis—the Ruelle- 
Takens conjecture that strange attractors were the internal
mechanism driving it [103]. This formalized an earlier
proposal (“deterministic nonperiodic flow”) by the
meteorologist Lorenz [104]: nonlinear instability was re- 
sponsible for the unpredictability of weather and fluid 
turbulence generally. 


There was a confounding problem, though. On the one 
hand, we had time series of measurements of the fluid 
velocity at a point in a flow. On the other, we had the 
abstract mathematics of strange attractors—complicated 
manifolds that circumscribed a system’s instability. How 
to connect them? This was solved by the proposals to use 
the measured time series to “reconstruct” the system’s ef- 
fective state space. This was the concept of extracting 
the attractor’s “geometry from a time series” (1980-81) 
[14, 15]. These reconstruction methods created an effec- 
tive state space in which to look at the chaotic attractors 
and to quantitatively measure their degree of instability 
(Kolmogorov-Sinai entropy and Lyapunov characteristic 
exponents) and their attendant complicatedness (embed- 
ding and fractal dimensions). This was finally verified 
experimentally in 1983 [105], overthrowing the decades- 
old Landau-Lifshitz multiple incommensurate-oscillator 
view of turbulence. 


Reconstructing a chaotic attractor from a time series be- 
came a widely used technique for identifying and quanti- 
fying deterministic chaotic behavior, leading to the field 
of nonlinear time series modeling [106]. 


Reconstruction, however, fell short of concisely express- 
ing a system’s internal structure. Could we extend re- 
construction to extract the system’s very equations of 
motion? A substantial benefit would be a robust way to 
predict chaotic behavior. The answer was provided in a 
method to reconstruct “Equations of Motion from a Data 
Series” [107, 108]. 


This worked quite well, when one happened to choose a 
mathematical representation that matched the class of 
nonlinear dynamics generating the behavior. But as Ref. 
[107] demonstrated in 1987, if you did not have the cor- 
rect representational “basis” it not only failed miserably, 
it also did not tell you how and where to look for a bet- 
ter basis. Thus, even this approach to modeling complex 
systems had an inherent subjectivity in the choice of rep- 
resentation. Structural complexity remained elusive. 


How to remove this subjectivity? The answer was pro- 
vided by pursuing a metaphor to the classification scheme 
for automata developed in discrete computation theory 
[3, 109, 110]. There, the mathematics of formal languages 
and automata had led in the 1950s and 1960s to a struc- 
tural hierarchy of representations that went from devices 
that used finite memory to infinite memories organized in 
different architectures—tapes, stacks, queues, counters, 
and the like. 


Could we do this, not for discrete bit strings, but con- 
tinuous chaotic systems? Answering this question led 
directly to computational mechanics as laid out in 1989 
by Ref. [16]. The answer turned on a predictive equiv- 
alence relation developed from the geometry-of-a-time- 
series concept of reconstructed state [14] and adapted 
to an automata-theoretic setting. The equivalence rela- 
tion gave a new kind of state that was a distribution 
of futures conditioned on past trajectories in the recon- 
structed state space. These were the causal states and 
the resulting probabilistic automata were ε-machines. In
this way, many of the notions of information processing 
and computing could be applied to nonlinear physics. 


IV. MISDIRECTION 


The preceding history introduced the goals of computa- 
tional mechanics, showed its recent progress, and put its 
origins in the historical context of nonlinear dynamics of 
complex systems, such as fluid turbulence. As we will 
now see, the original history and recent progress form 
a necessary backdrop for some distracting, but pressing
business.


It is abundantly clear at this point that the preceding 
overview is not a literature review on intrinsic computa- 
tion embedded in complex systems. Such a review would 
be redundant since reviews and extensive bibliographies 
that cite dozens of active researchers have been provided 
elsewhere and at semi-regular intervals since Ref. [16] 
(1989); see, e.g., Refs. [17-19, 111, 112]. Rather, the 
preceding is provided as a narrative synopsis of its moti- 
vations, goals, and historical setting. After three decades 
of extensive work by many researchers in computational 
mechanics, why is this necessary? The reason is that 
critiques appeared recently that concern computational 
mechanics publications from the 1980s and 1990s—that 
is, works that are two and three decades old. And so, the
early history and recent progress are a necessary backdrop.


The following addresses the issues raised and explains 
that, aside from several interesting, detailed mathemat- 
ical issues, they are in large measure misguided. They 
are based on arguments that selectively pick details, ei- 
ther quoting them out of context or applying inappro- 
priate contexts of interpretation. As presented, they are 
obscured technically so that expertise is required to eval- 
uate the arguments. In other cases, the issues raised are 
not criticisms at all—they are already well known. The 
following (A) reviews the issues and offers a broad re- 
sponse that shows they are misguided at best and (B) 
highlights the rhetorical style of argumentation, which 
shows that the nontechnical (in some cases, ad hominem) 
arguments rely on fundamental errors of understanding. 
After reviewing all of them carefully, we cannot find any 
concern that would lead one to question the very solid 
and firm grounding of computational mechanics. 


A. Technical Contentions 


As analysis tools, ε-machines are defined and used in two
different ways. In the first they are defined via the predictive
equivalence relation over sequences, as already discussed
and as will be detailed shortly; these are history
ε-machines. In the second, ε-machines are defined as
predictive generators of processes; these are generator
ε-machines. (Mathematically, they are unifilar hidden
Markov models with probabilistically distinct states that
generate a given process.) The definitions are complementary.
In the first, one goes from a given process to
its ε-machine; in the second, one specifies an ε-machine
to generate a given process. Importantly, the definitions
are equivalent and this requires a nontrivial proof [57].
The criticisms concern history ε-machines and so we need
focus only on them. The computational mechanics of
ε-machine generators is not at issue.


Reference [113] raises technical concerns regarding sta- 
tistical estimation of finite-state and probabilistic finite 
state machines, as discussed in several-decades-old com- 
putational mechanics publications; principally two from 
1989 and 1990: Refs. [16] and [114], respectively. 


The simplest response is that almost all of the con- 
cerns have been superseded by modern computational 
mechanics: mixed-state spectral decomposition [28, 29] 
and Bayesian structural inference and ε-machine enumeration
methods [41, 52]. The view from the present is that
the issues are moot. 


That said, when taken at face value, the bulk of the issues 
arise from technical misinterpretations. Largely, these 
stem from a failure to take into account that computa- 
tional mechanics introduced and regularly uses a host 
of different equivalence relations to identify related, but 
different kinds of state. Ignoring this causes confusion. 
Specifically, it leads to Ref. [113]’s misinterpretations of 
covers and partitions of sequence space, transient versus 
recurrent causal states, the vanishing measure of nonsynchronizing
sequences, and an ε-machine’s start state. It
also leads to a second confusion over various machine 
reconstruction methods. Let’s take these two kinds of 
misunderstanding in turn. 


Effective states and equivalence relations. One of computational
mechanics’ primary starting points is to identify
a stochastic process’ effective states as those determined
by an equivalence relation. Said most simply, group pasts
that lead to the same distribution of futures. Colloquially:
do not make distinctions that do not help in prediction.
The equivalence relation $\sim$ connects two pasts
$x_{-K:0} = x_{-K} \ldots x_{-1}$ and $x'_{-K':0} = x'_{-K'} \ldots x'_{-1}$, if the
future $X_{0:L} = X_0 \ldots X_{L-1}$ after having seen each looks
the same:


$$x_{-K:0} \sim x'_{-K':0} \iff \Pr(X_{0:L} \mid x_{-K:0}) = \Pr(X_{0:L} \mid x'_{-K':0}) .$$


Taking finite or infinite pasts and futures and those of 
equal or unequal lengths defines a family of equivalence 
relations and so of different kinds of causal state. 
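
A toy sketch of how the choice of past and future lengths changes the resulting states, under stated assumptions: pasts have a single fixed length K, conditional future distributions are estimated by simple counting, and two pasts are merged when their estimated distributions lie within a tolerance in total variation distance. The function names, the tolerance, and the greedy merging are illustrative choices only; this is not any of the published reconstruction algorithms.

```python
from collections import Counter, defaultdict

def future_distributions(seq, K, L):
    """Empirical Pr(future of length L | past of length K) from one sequence."""
    counts = defaultdict(Counter)
    for t in range(K, len(seq) - L + 1):
        counts[seq[t - K:t]][seq[t:t + L]] += 1
    return {
        past: {fut: n / sum(c.values()) for fut, n in c.items()}
        for past, c in counts.items()
    }

def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

def group_pasts(seq, K, L, tol=0.05):
    """Greedily group pasts whose estimated future distributions agree within tol."""
    dists = future_distributions(seq, K, L)
    classes = []  # list of (representative distribution, member pasts)
    for past in sorted(dists):
        for rep, members in classes:
            if total_variation(rep, dists[past]) <= tol:
                members.append(past)
                break
        else:
            classes.append((dists[past], [past]))
    return [members for _, members in classes]

seq = "001" * 50  # period-3 example sequence
print(group_pasts(seq, K=2, L=1))  # [['00'], ['01', '10']]
print(group_pasts(seq, K=2, L=2))  # [['00'], ['01'], ['10']]
```

For this periodic sequence, one-symbol futures cannot tell the pasts 01 and 10 apart, while two-symbol futures separate all three pasts. Different choices of K and L thus yield different sets of effective states.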


“Inferring Statistical Complexity” (1989) focused on de- 
termining a process’ long-term memory and so used 
$K, K' \to \infty$ and $L \to \infty$ [16]. That is, it worked
with a process’ recurrent causal states, defining the process’
statistical complexity as the amount of information
they store. This and later works also used finite pasts
($K, K' \in \{0, 1, 2, 3, \ldots\}$) and infinite futures ($L \to \infty$) to
define causal states more broadly. This introduced the
notion of transient causal states. In turn, they suggested
the more general notion of mixed states that monitor how
an observer comes to know a process’ effective states,
that is, how the observer synchronizes to a process. And, finally,
in this regime one has the subtree reconstruction method
that merges candidate states with different-length pasts.
The mixed states are critical to obtaining closed-form
expressions for a process’ information measures [29-31].
This setting also introduces the notion of an ε-machine’s
start state: the effective state the process is in, having
a correct model in hand, but having made no measurements:
$K, K' = 0$. Similarly, later works used infinite
pasts and finite-length futures. Finally, using pasts and 
futures of equal length, but increasing them incremen- 
tally from zero leads to the class of causal-state splitting 
reconstruction methods [115]. 


Why all these alternatives? The answer is simple: each 
equivalence relation in the family poses a different ques- 
tion to which the resulting set of states is the answer 
or, at least, is an aid in answering. For example and 
somewhat surprisingly, Upper showed that even with in- 
finite pasts and futures and the induced recurrent causal 
states, there are elusive and unreachable states that are 
never observed [116]. More to the point, defining other 
kinds of state has been helpful in other ways, too. For 
example, to define and then calculate a process’ Markov 
and cryptic orders requires a different kind of transient 
state [37]. Analogously, very general convergence prop- 
erties of stochastic processes are proved by constructing 
the states of a process’ possibility machine [38, 39, 117]. 


With this flexibility in defining states, the mathematical 
foundations of computational mechanics give a broad set 
of analytical tools that tell one how a given process is 
organized, how it generates and transforms its informa- 
tion. Insisting on and using only one definition of causal 
state gives a greatly impoverished view of the structure 
of stochastic processes. Each kind is an answer to a dif- 
ferent question. Apparently, this richness and flexibility 
is a source of confusion. No surprise, therefore, that if a 
question of interest is misunderstood, then a given rep- 
resentation may appear wrong, when it is in fact correct 
for the task at hand. 


Reconstruction methods. Reference [113] is unequivocal
in its interpretation of machine reconstruction. It turns 
out there is little need to go into a detailed rebuttal of 
its statements, as they arise from a kind of misinterpre- 
tation similar to the misinterpretations discussed above. 
In short, Ref. [113] confuses a set of related, but distinct
machine reconstruction methods.


For example, sometimes one is interested in a representa- 
tion of the state machine that simply describes a process’ 
set of allowed realizations; that is, we are not interested in 
their probabilities, only which strings occur and which do 
not. This is the class of topological machine reconstruction
methods, whose origins go back to the earliest
days of the theory of computation, to David Huffman’s
work. One can also, as a quick approximation, take a 
topologically reconstructed machine and have it read over 
a process’ sequence data and accumulate transition and 
state probabilities. This is a mixture of topological re- 
construction and empirical estimation. And, finally, one 
can directly estimate fully probabilistic ε-machines via
algorithms that implement the equivalence relation of in- 
terest. 
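
The mixed approach (topological reconstruction followed by empirical estimation) can be sketched as follows. The sketch assumes the topological machine is already known and unifilar, and that the state at the start of the data is known; the example topology, the state labels, and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical topological machine: topology[state][symbol] = next_state.
# It allows every binary string that has no two consecutive 1s.
TOPOLOGY = {"A": {"0": "A", "1": "B"}, "B": {"0": "A"}}

def estimate_probabilities(topology, start, data):
    """Read data through a fixed topology and estimate transition probabilities."""
    counts = defaultdict(Counter)
    state = start
    for symbol in data:
        if symbol not in topology[state]:
            raise ValueError(f"symbol {symbol!r} is not allowed in state {state!r}")
        counts[state][symbol] += 1
        state = topology[state][symbol]
    # Normalize the counts per state; states never visited do not appear.
    return {
        s: {sym: n / sum(c.values()) for sym, n in c.items()}
        for s, c in counts.items()
    }

print(estimate_probabilities(TOPOLOGY, "A", "001" * 4))
```

Only the probabilities are estimated here; the set of states and allowed transitions is fixed in advance, so any error in the assumed topology carries over to the estimates.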


One can then use this range of reconstruction 
methods—topological, topological plus empirical, and 
probabilistic—with one or the other of the above equiva- 
lence relations. 


It is important to point out that these statistical meth- 
ods all have their weaknesses. That is, for a given re- 
construction algorithm implementation, one can design a 
process sample for which the implementation will behave 
misleadingly. For example, it has been known for some 
time that causal-state splitting reconstruction methods 
[115] often give machines with a diverging set of states, 
if one presents them with increasingly more data. This 
occurs due to its “determinization” step, which has an 
exponential state-set blow-up when converting an inter- 
mediate, approximate nondeterministic presentation to 
a deterministic (or unifilar) one. Analogously, the sub- 
tree reconstruction method suffers from “dangling states” 
in which inadequate data leads to improperly estimated 
future conditional distributions from which there is no 
consistent transition. This is not surprising in the least. 
Many arenas of statistical inference are familiar with such 
problems, especially when tasked to do out-of-class mod- 
eling. The theoretical sleight of hand one finds in math- 
ematical statistics is to assume data samples come from 
a known model class. For those interested in pattern dis- 
covery, this begs the question of what are patterns in the 
first place. 


Now, many such problems can be overcome in a theoret- 
ical or computational research setting by presenting the 
algorithms with a sufficient amount of data. However, in
a truly empirical setting with finite data, one must take 
care in their use. 


To address the truly empirical setting, these problems 
led us to introduce Bayesian Structural Inference for
ε-machines [52]. It relies on an exact enumeration of a set
of candidate ε-machines and related models [41]. It does
not suffer from the above estimation problems in that it 
does not directly convert data to states and transitions as 
the above reconstruction algorithms do. Rather, it uses 
well-defined candidate models (ε-machines) to estimate
the probability that each produced the given data. It 
works well and is robust, even for very small data sets.
That is, it is data parsimonious and relatively computationally
efficient. And, if one has extra knowledge (from
theoretical or symmetry considerations) one needs only 
use a set of candidate models consistent with that knowl- 
edge. In many settings, this leads to markedly increased 
computational efficiency. 
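
The idea of scoring well-defined candidate models against data, rather than converting data directly into states, can be illustrated with a deliberately simplified sketch. It assumes each candidate is a unifilar machine with fixed transition probabilities, a known start state, and a uniform prior over candidates. These are simplifying assumptions made here for illustration; the method of Ref. [52] is more elaborate and is not reproduced. The candidate machines and function names are hypothetical.

```python
from math import exp, log

# Hypothetical candidates: machine[state][symbol] = (probability, next_state).
FAIR_COIN = {"S": {"0": (0.5, "S"), "1": (0.5, "S")}}
GOLDEN_MEAN = {
    "A": {"0": (0.5, "A"), "1": (0.5, "B")},
    "B": {"0": (1.0, "A")},
}

def log_likelihood(machine, start, data):
    """Log-probability that the machine, begun in `start`, generates `data`."""
    state, total = start, 0.0
    for symbol in data:
        if symbol not in machine[state]:
            return float("-inf")  # the machine cannot generate this word
        prob, state = machine[state][symbol]
        total += log(prob)
    return total

def posterior(candidates, data):
    """Posterior over candidates (name -> (machine, start)), uniform prior."""
    scores = {
        name: log_likelihood(machine, start, data)
        for name, (machine, start) in candidates.items()
    }
    best = max(scores.values())
    if best == float("-inf"):
        raise ValueError("no candidate can generate the data")
    # Subtract the best score before exponentiating, for numerical stability.
    weights = {name: exp(score - best) for name, score in scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}

candidates = {"fair_coin": (FAIR_COIN, "S"), "golden_mean": (GOLDEN_MEAN, "A")}
print(posterior(candidates, "0010100100"))
```

Even with ten symbols the posterior favors the structured candidate (8/9 versus 1/9 here), and a single occurrence of the word 11 would rule it out entirely. Restricting the candidate set using prior knowledge simply means passing fewer entries in `candidates`.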


To close this section, it is clear that one could spend an 
inordinate amount of time arguing which combination of 
the above equivalence relations and reconstruction meth- 
ods is “correct” and which is “incorrect”. This strikes me 
as unnecessarily narrow. The options form a toolset and 
those methods that produce consistent results, strength- 
ened by testing against known cases, yield important pro- 
cess properties. Practically, I recommend Bayesian Struc- 
tural Inference [52]. If I know a source will have low 
entropy rate and I want to see if it is structurally com- 
plex, though, I use probabilistic subtree reconstruction. 
I avoid causal-state splitting reconstruction. 


B. Rhetorical Diversions 


The preceding text offers a concise rebuttal to Ref. [113]’s 
claims by identifying their common flaws. The latter’s 
technical discussion, though, is embedded in a misleading 
rhetorical style. The import of this misdirection may be 
conveyed by analyzing two less technical points that are 
also presented with distracting emotion. 


The first is a misreading of the 1989 computational me- 
chanics publication, “Inferring Statistical Complexity”. 
The claim is that the title is grossly misleading since the 
article is not about statistical inference. This is an oddly 
anachronistic view of work published 30 years ago, which 
seems to require looking through the lens of our present 
Big Data era and the current language of machine learn- 
ing. 


Read dispassionately, the title does allude to “inferring”, 
which the dictionary says is “deducing or concluding (in- 
formation) from evidence and reasoning rather than from 
explicit statements”. And that, indeed, is how the article 
approaches statistical complexity—discovering patterns 
of intrinsic computation via the causal equivalence rela- 
tion. It not only defines statistical complexity, but also 
introduces the mathematics to extract it. Yes, the article 
is not statistical inference. The topic of statistical infer- 
ence as it is understood today was addressed in a number 
of later works; the most recent of which was mentioned 
above—“Bayesian Structural Inference for Hidden Pro- 
cesses” [52]. In short, the criticism is as specious as the 
rhetoric is distracting: the claim attributes anachronistic 
and inaccurate meanings to the article. 


The second nontechnical issue is developed following a
similar strategy, and it also reveals a deep misunderstanding.
Packard and I had studied the convergence
properties of Shannon’s entropy rate [118-120] and along 
with Rob Shaw [121] had realized there was an important 
complexity measure—the past-future mutual information 
or excess entropy—that not only controlled convergence, 
but was on its own a global measure of process correlation.
As those articles and Packard’s 1982 PhD dissertation
[122] point out, this quantity was already used to
classify processes in ergodic theory [123]. 


Given that excess entropy’s basic properties and alterna- 
tive definitions had been explored by then, Packard and 
I moved on to develop a more detailed scaling theory for 
entropy convergence, as one of the articles noted in its 
title “Noise Scaling of Symbolic Dynamics Entropies”. In 
this we defined the normalized excess entropy, which was 
normalized to its exact infinite-history, zero-noise value. 
This followed standard methods in phase transition the- 
ory to use “reduced” parameters. (A familiar example is 
the reduced temperature $t = (T - T_c)/T_c$ normalized to
vanish at the critical temperature $T_c$ at which the phase
transition of interest occurs.) 


The complaint is that this definition is intentionally mis- 
leading since it is not the excess entropy. Indeed, it is not. 
The normalized excess entropy is a proxy for a single term 
in the excess entropy. And, the article is absolutely clear 
about its focus on scaling and the tools it employs. Once 
one does have a theory of how entropy convergence scales, 
in particular the convergence rate, then it is easy to back
out the excess entropy. A simple formula expresses the
excess entropy in terms of that rate and the single-symbol 
entropy. 
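
One way such a formula can arise is sketched here under an explicit assumption that is not taken from the text above: that the finite-length entropy-rate estimates $h_\mu(L) = H(L) - H(L-1)$, with $H(0) = 0$, converge exponentially to the entropy rate $h_\mu$ at a rate $\gamma > 0$. The symbols $\mathbf{E}$ and $\gamma$ and the exponential form are illustrative. Summing the resulting geometric series gives

$$\mathbf{E} = \sum_{L=1}^{\infty} \left[ h_\mu(L) - h_\mu \right], \qquad h_\mu(L) - h_\mu = \left[ H(1) - h_\mu \right] 2^{-\gamma (L-1)} \;\Longrightarrow\; \mathbf{E} = \frac{H(1) - h_\mu}{1 - 2^{-\gamma}} .$$

Under this assumption the excess entropy depends only on the single-symbol entropy $H(1)$, the entropy rate $h_\mu$, and the convergence rate $\gamma$.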


So, this too is a toothless criticism, but it exemplifies the 
emotion and rhetorical style employed throughout Ref. 
[113]. The nontechnical and ad hominem criticisms in- 
tertwined with the technical faults are evidence of the 
consistent projection of irrelevant meanings onto the ma- 
terial. Once such an intellectually unproductive strategy 
is revealed, further rebuttal is unnecessary. 


V. FINAL REMARKS 


To summarize, computational mechanics rests on firm 
foundations—a solidity that led to many results over the 
last three decades, ranging from theoretical physics and 
nonlinear mathematics to diverse applications. It is a di- 
including my own, in the 1970s and early 1980s to describe
the complex behaviors found in fluid turbulence.


Reference [113]’s technical claims arise from a misunder- 
standing of computational mechanics’ goals, methods, 
successes, and history. Its rhetoric reveals a strategy 
of quoting out of context and reinterpreting decades-old 
work either without benefit of modern results or project- 
matic conclusions on what is “correct” that follow from
such a strategy are flawed. Moreover, Ref. [113]’s claims 
to precedence are based on false memories, are unsub- 
stantiated, and, in light of the history of events, are un- 
substantiatable. 


Current work simply eclipses the questions raised in dis- 
tant retrospect, rendering the criticisms moot. Time 
passes. We should let it move on. 


Over the years, computational mechanics has been 
broadly extended and applied, far beyond its initial con- 
ception 30 years ago. That said, its hope to lay the foun- 
dations of a fully automated “artificial science” [16]—in 
which theories are built automatically from raw data— 
remains a challenge. Though the benefits are tantalizing, 
it was and remains an ambitious goal. 